Combination of a hidden tag model and a traditional n-gram model: a case study in Czech speech recognition
Abstract
A speech recognition system targeting highly inflective languages is described that combines a traditional trigram language model with an HMM tagger, obtaining results superior to the trigram language model alone. An experiment on speech recognition of Czech has been performed with promising results.

1. Speech Recognition of Inflective Languages

Inflective languages pose a hard problem in speech recognition due to two phenomena: their highly inflective nature (causing a data sparseness problem and excessive vocabulary growth) and free word order (causing traditional speech recognition systems, such as n-gram Hidden Markov Models (HMMs) on word forms, to be less accurate than for English). Specific methods targeting speech recognition of inflective languages have already been introduced in [1], [2] and [3]. The authors mainly focus on improving the language model by decomposing words from the vocabulary into stems and endings. This approach has mainly helped in reducing the size of the recognizer's vocabulary, reducing the WER slightly.

2. Combining Taggers with Language Models

To the best of our knowledge, a tagger was first introduced as a language model component of a speech recognizer in [4], without improving results over the baseline bigram model. The idea has been further explored in [5], where the author proposes interpolation with a trigram model:

P(W) = λ P(w_i | w_{i-2}, w_{i-1}) + (1 − λ) Q(w_i | g(w_{i-2}), g(w_{i-1})),   (1)

where g(w_i) is the tagging function. The importance of formula (1) for languages with the data sparseness problem is that the new component Q can have enough evidence to give reliable statistics about the word sequence W, as the size of the tag set tends to be much smaller than the size of the vocabulary itself. The problem with approach (1) is that the tagging function g(w_i) depends on all words of the utterance (supposing that the tagging is performed by an HMM tagger).
The standard solution is to replace the probability Q by a new probability Q*:

Q*(w_i | w_1, …, w_{i-1}) = Σ_{g_1, g_2} Q(w_i | g_1, g_2) T(g(w_{i-2}) = g_2, g(w_{i-1}) = g_1).   (2)

The new probability T is the corresponding forward probability of the HMM tagger.
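A minimal sketch of the marginalization in formula (2), assuming toy values for Q and for the tagger's forward probabilities T (none of these numbers come from the paper):

```python
# Sketch of Eq. (2): Q is replaced by Q*, a sum over tag pairs
# weighted by the HMM tagger's forward probabilities T.
# All numbers below are illustrative assumptions, not the paper's model.

TAGS = ("VERB", "NOUN")

# Toy Q(w_i | g2, g1): word probability given the tags of the two
# preceding words (g2 = tag of w_{i-2}, g1 = tag of w_{i-1})
Q = {
    ("VERB", "VERB", "psa"): 0.10,
    ("VERB", "NOUN", "psa"): 0.05,
    ("NOUN", "VERB", "psa"): 0.02,
    ("NOUN", "NOUN", "psa"): 0.01,
}

# Toy forward probabilities T(g(w_{i-2}) = g2, g(w_{i-1}) = g1):
# the tagger's joint distribution over the two preceding tags
T = {
    ("VERB", "VERB"): 0.60,
    ("VERB", "NOUN"): 0.20,
    ("NOUN", "VERB"): 0.15,
    ("NOUN", "NOUN"): 0.05,
}

def q_star(w):
    """Q*(w | history): marginalize Q over tag pairs weighted by T."""
    return sum(
        Q.get((g2, g1, w), 0.0) * T.get((g2, g1), 0.0)
        for g2 in TAGS
        for g1 in TAGS
    )

print(round(q_star("psa"), 4))  # 0.0735
```

Because T is a distribution over tag pairs, Q* averages the tag-conditioned word probabilities instead of committing to a single tagging of the history.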
Related papers
Persian Phone Recognition Using Acoustic Landmarks and Neural Network-based variability compensation methods
Speech recognition is a subfield of artificial intelligence that develops technologies to convert speech utterances into transcriptions. So far, various methods such as hidden Markov models and artificial neural networks have been used to develop speech recognition systems. In most of these systems, the speech signal frames are processed uniformly, while the information is not evenly distributed ...
Presentation of K Nearest Neighbor Gaussian Interpolation and comparing it with Fuzzy Interpolation in Speech Recognition
Hidden Markov Model is a popular statistical method that is used in continuous and discrete speech recognition. The probability density function of observation vectors in each state is estimated with discrete-density or continuous-density modeling. The performance (in correct word recognition rate) of continuous-density HMM is higher than discrete-density HMM, but its computational complexity is very ...
Improving Phoneme Sequence Recognition using Phoneme Duration Information in DNN-HSMM
Improving phoneme recognition has attracted the attention of many researchers due to its applications in various fields of speech processing. Recent research achievements show that using deep neural network (DNN) in speech recognition systems significantly improves the performance of these systems. There are two phases in DNN-based phoneme recognition systems including training and testing. Mos...
Learning Representations for Weakly Supervised Natural Language Processing Tasks
Finding the right representations for words is critical for building accurate NLP systems when domain-specific labeled data for the task is scarce. This article investigates novel techniques for extracting features from n-gram models, Hidden Markov Models, and other statistical language models, including a novel Partial Lattice Markov Random Field model. Experiments on part-of-speech tagging and ...
Publication date: 2003